Rank in Wordlist | Word | Rank in Wordlist | Word |
---|---|---|---|
1 | جي | 26 | جنهن |
2 | ۾ | 27 | پر |
3 | ۽ | 28 | ئي |
4 | جو | 29 | اهو |
5 | آهي | 30 | آهي، |
6 | کي | 31 | ڪيو |
7 | ته | 32 | پنهنجي |
8 | تي | 33 | ويو |
9 | کان | 34 | يا |
10 | ان | 35 | ڪرڻ |
11 | به | 36 | سنڌي |
12 | سان | 37 | پوءِ |
13 | هو | 38 | ڪنهن |
14 | هن | 39 | انهن |
15 | هڪ | 40 | ٿيو |
16 | ڪري | 41 | پاڻ |
17 | نه | 42 | اسان |
18 | ٿي | 43 | وقت |
19 | آهن | 44 | ٿا |
20 | جا | 45 | جڏهن |
21 | سنڌ | 46 | هئي |
22 | ٿو | 47 | ڪئي |
23 | مان | 48 | واري |
24 | سندس | 49 | صاحب |
25 | لاءِ | 50 | پڻ |
The table shows the 50 most frequent words of the corpus. At these ranks the entries are usually stopwords.
Language: Sindhi
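This list is a good candidate for a first stopword list for the language.
A small, balanced corpus is usually enough to obtain a good list of high-frequency words. If the small corpus is dominated by a prominent topic, however, topic-specific words will appear even among the top-ranked entries. In the table above, for example, سنڌ ("Sindh", rank 21) and سنڌي ("Sindhi", rank 36) are content words rather than stopwords.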
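As a rough illustration of how such a candidate list can be derived, the sketch below counts whitespace-separated tokens in a tiny made-up corpus and returns the most frequent ones. The helper name `top_words`, the sample sentences, and the naive whitespace tokenization are assumptions for illustration only; this is not the procedure that produced the wordlist shown here.

```python
from collections import Counter


def top_words(sentences, n=50):
    """Return the n most frequent tokens with their counts (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        # Naive whitespace tokenization: punctuation stays attached to tokens,
        # so "word," and "word" would be counted as two different entries.
        counts.update(sentence.split())
    return counts.most_common(n)


# Made-up sample corpus, only for demonstration.
sample = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog",
]

candidates = top_words(sample, n=3)
print(candidates)
```

With a tokenization this simple, a word followed directly by a comma is counted separately from the bare word, which is one plausible explanation for an entry like rank 30 in the table. A stopword list built this way would normally be reviewed by hand before use.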
select w_id-100 as rank_in_wordlist, word from words where w_id>100 order by w_id limit 50;
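The statement above can be tried out on a toy database. The following sketch uses Python's built-in `sqlite3` module with an in-memory table; the table layout, the `freq` column, and the sample rows are hypothetical. It assumes, as the query itself suggests, that `w_id` values up to 100 are reserved for non-word tokens and that words are numbered from 101 onward in order of decreasing frequency.

```python
import sqlite3

# In-memory database with a minimal, assumed schema for the words table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (w_id INTEGER PRIMARY KEY, word TEXT, freq INTEGER)"
)

# Assumption: ids 1..100 hold non-word tokens such as punctuation.
rows = [(1, ".", 900), (2, ",", 800)]
# Assumption: real words start at id 101, sorted by decreasing frequency.
rows += [(101, "the", 500), (102, "of", 300), (103, "and", 250)]
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

# The query from the text, unchanged.
query = (
    "select w_id-100 as rank_in_wordlist, word "
    "from words where w_id>100 order by w_id limit 50;"
)
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Subtracting 100 from `w_id` turns the internal id directly into the rank shown in the table, so no separate ranking step is needed under these assumptions.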
3.4 Sample words for different frequency ranges